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Abstract. Reduced basis methods are projection-based model order reduction techniques 
for reducing the computational complexity of solving parametrized partial differential equation 
problems. In this work we discuss the design of pyMOR. a freely available software library of 
model order reduction algorithms, in particular reduced basis methods, implemented with the 
Python programming language. As its main design feature, all reduction algorithms in pyMOR are 
implemented generically via operations on well-defined vector array, operator and discretization 
interface classes. This allows for an easy integration with existing open-source high-performance 
partial differential equation solvers without adding any model reduction specific code to these 
solvers. Besides an in-depth discussion of pyMOR’s design philosophy and architecture, we present 
several benchmark results and numerical examples showing the feasibility of our approach. 
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1. Introduction. Over the past years, model order reduction methods have become 
an important part of many numerical simulation workflows for handling large-scale ap¬ 
plication problems. Reduced basis (RB) methods are a popular family of such reduction 
techniques, applicable to parametrized high-dimensional models described by partial dif¬ 
ferential equations (PDEs). The main ingredient of RB methods is a Galerkin projection 
of the differential equation onto a problem-adapted reduced subspace generated from 
solution snapshots of a high-dimensional approximation of the problem for certain well- 
chosen sampling parameters. While the high-dimensional approximation using standard 
discretization techniques (such as finite element methods) often yields discrete function 
spaces with millions of degrees of freedoms, the reduced spaces generated by RB methods 
typically are of order 100 or smaller, while still retaining the same approximation quality 
for the problem at hand as the high-dimensional space. In practice, model order reduc¬ 
tion by RB approximation can lead to speedups of up to several orders of magnitude. 
By now, a large body of literature has emerged which theoretically proves and practi¬ 
cally demonstrates the applicability of the RB approach to a large variety of application 
problems (see, e.g., the recent monographs [18| 130) . the tutorial m, and the references 
therein). 
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Despite the popularity of RB methods, only few software implementations have been 
discussed in the literature so far. We are only aware of publications discussing rbMIT |28| . 
RB modules for libMesh |21| and feel++ |in| . as well as the combination of RBmatlab 
with dune-rb |12| . However, these approaches are either too simplistic to be applied 
to large-scale problems (rbMIT) or offer limited code re-usability by being tied to a 
specific PDE solver ecosystem. While RBmatlab defines interfaces for the integration 
with dune-rb which would allow to reuse its algorithms in conjunction with other PDE 
solvers implementing the same interfaces, code re-usability is still limited by the fact that 
RBmatlab requires all parts of the reduction algorithm involving high-dimensional data 
to be implemented by the PDE solver. 

In this article we discuss the design of pyMOR: an open-source. Python-based model 
reduction library which is being developed as part of the MULTIBAT research project 
(cf. |25| for a brief overview on pyMOR in the context of Multibat). Since one of the 
goals of said project is to use RB techniques for the reduction of microscale battery 
models implemented independently by another group participating in the project, an 
easy integration with third-party PDE solvers is a central design goal for pyMOR. 

To allow easy integration with external solvers, pyMOR is built around a small set of 
interface classes for interaction with the solver. Unlike the approach taken by RBmatlab, 
pyMOR’s interface classes are designed for lower level communication by directly repre¬ 
senting the high-dimensional vector and operator objects inside the solver. This allows 
to implement model order reduction schemes completely within pyMOR as generic algo¬ 
rithms operating on these interface classes. A new external solver is integrated simply 
by making the solver’s data structures available via a public interface to which pyMOR 
can connect to. No model reduction specific code has to be added to the solver. pyMOR’s 
interface design not only facilitates collaboration between researchers by decoupling PDE 
solver and RB algorithm development. It also fosters the evaluation and adoption of new 
RB techniques, since algorithms implemented with pyMOR can be tested more easily with 
problems which have not been considered by the original author. By now, pyMOR was 
used successfully in laiMllHlEl]. 

A design approach similar to pyMOR has been taken by the modred |5] package, a 
software library implementing modal decomposition algorithms which operate on generic 
vector objects provided by an external source. Due to the algorithmic requirements of RB 
methods, however, our approach goes further by also allowing operations on the solver’s 
system matrices or (nonlinear) operators, resulting in a deeper integration between the 
two software components. 

Recently, RBniCS was introduced [T], a Python-based RB library built on top of the 
FEniCS |22| PDE solver library. Similar to pyMOR, RBniCS allows easy development of RB 
applications in Python while leveraging FEniCS as a high-performance solver. However, 
in contrast to pyMOR, RBniCS is tied to the FEniCS ecosystem and cannot be integrated 
with other PDE solvers. redbKIT, which has been developed as companion software to 
the recent monograph |30| . follows a design similar to rbMIT. Apart from RB libraries, RB 
methods are used nowadays in several specialized simulation softwares such as NiftySim 
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|19) or the commercial code Akselo^ However, these implementations are restricted to 
their specific application domain. 

This article is organized as follows: Section contains a brief introduction to the 
RB methodology and provides the mathematical background needed to follow the tech¬ 
nical discussions in the subsequent sections. We discuss pyMOR’s design from a bird’s 
eye perspective and compare it to other design approaches in Section In Section 
we cover pyMOR’s interface classes in more detail, discuss important implementational 
aspects and give a basic example, how external solvers can be integrated with pyMOR. 
Moreover, we discuss the parallelization of pyMOR’s reduction algorithms. In Section 
we finally present technical benchmarks and a more advanced numerical example which 
demonstrate the performance of our software design and its applicability to large-scale 
problems. 


2. The reduced basis method. In this section we give a very short introduction to 
the RB methodology that will hopefully give the reader sufficient background to under¬ 
stand the discussion of our software design and our numerical experiments. We mainly 
focus on the basic class of linear, coercive, affinely decomposed problems for which the 
fundamental ideas of the approach can be most clearly described. A few of the many 
extensions of the methodology are discussed in Section |2.3| For a more detailed intro¬ 
duction to RB methods we refer to [l5l [HI [30] • 


2.1. Linear, coercive, affinely decomposed problems. We consider parametrized 
linear, strongly elliptic problems in weak form. More abstractly, we search for solutions 
Ufj, € V in some Hilbert space V satisfying 

t) = F{t) G V. (1) 

Here, for each parameter /j, contained in some parameter space n € V, is a continuous, 
coercive bilinear form and T is a continuous linear functional on V. Due to the Lax- 
Milgram theorem, Q admits a unique solution for every /r. Moreover, we assume that 
is affinely decomposed, i.e., there are Q G N bilinear forms Ri,..., Bq : R x R — )> M, 
and mappings 6i,... ,6q : V —> M, such that 

Q 

B^ = ^eg{fi)B, V/iGiP. (2) 

q=l 

Our goal is now to be able to quickly find approximations for arbitrary // G R, 
assuming we are allowed to compute a few selected during a preceding offline phase. 
The latter is in practice achieved by assuming that [^is already the result of an appropriate 
high-dimensional discretization of the original analytical equation, which, however, can 
only be solved with large computational effort. 

‘http://www.akselos.com 


3 



R. MILK, S. RAVE AND F. SCHINDLER 


The basic idea of the RB method is to first hnd appropriate ^i,..., ^]\f G V during 
the offline phase such that Vj\f := spanju^^,..., is a good approximation space for 
the so called solution manifold Ai := | /r G V}. Then, an approximation G Vn 

of Uf^ is obtained during the online phase by Galerkin projection of Q, i.e. UN,fi satisfies 


= F{(p) \/ip G Vn- (3) 

Again, ([^ has a unique solution according to the Lax-Milgram theorem. Assuming, 
moreover, we have bounds 

a^\\ipf < \/ip£V, \\B^\\<'y^ V/r G P, (4) 


Cea’s lemma yields the a-priori quasi-best approximation bound 


1% “ ^ — ™f WUfj, - 


(5) 


To solve ^ numerically, we choose a basis bi,... ,b]\f of Vn and let ^ the 

coefficient vector of un,ij, with respect to this basis, i.e. 


N 

UN,)! = 'UiN.ii.rfin- ( 6 ) 

72=1 


With Bfj,{bj,bi), — Fibi), Un,^j. i® then the solution of the linear equation 

system = Kn- Due to (2|, we moreover have 

where denote the matrices of I^. Thus, (j^ can be assembled and solved with 

0{QN‘^ + N^) operations, independently of dim!/. With typical basis sizes of Ai ~ 100, 
@ can then be solved in less than a millisecond on current hardware. 


The coefficients usually are not of interest in themselves. However, if the reduced 
basis is available, we can reconstruct the reduced solution un,^ as an element of the 
function space V using ([^. Moreover, one is often only interested in certain quantities of 
interest which are derived from the solution u^. If these quantities are linear functionals 
of u^, we can again pre-compute their evaluations on bi,... ,b]sf to arrive at a reduced 
model which can produce these quantities completely independent of dim V. 


Finally, we have the following standard residual-based a posteriori bound for the re¬ 
duction error available 


(uTVj/i) II — 1 

'In 




(7) 


where the residual TZ^ : V —> V' is defined by TZ^[un,^{^) '.= F{ip) — B^{uN,^, v)- Since 
we assume V to be finite-dimensional, ||7^^(uAr^^)||_i can be computed as the norm of 
the Riesz representative of given by application of the inverse inner product 

matrix to the discrete residual vector. Using Q, this computation can be reduced to 
0{Q‘^N‘^) online operations. For cases where a stability estimate is not known a priori. 
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Algorithm 1: Greedy basis generation with error estimator 
Input: training set Strain U V, tolerance e, max. dimension Nmax 
Output: reduced spaces Vi,, Vjsr 
N i — 0, Vq i — {d}; LTD i — ReduceCI^) 

while N < Nmax and ErrEst(RBSolve LTD), fj,, LTD) > e do 

I N ^ N + l 

1 /i* argmax^g^^^^,^ ErrEst(RBSolve(/i, DD), DD) 

2 Ufj,* ■(— Solve (/i*) 

3 I Vat •(— span(VAf-i U {%*}), DD ir- Reduce (Vat) 
end 


several algorithms have been developed to efficiently compute during the online phase 
(cf. |30[ Section 3.7] and references therein). 

2.2. Basis construction. As visible from the reduced solution UAr,/i of (j^ is a 
quasi-optimal approximation of within the given reduced space Vat. We are therefore 
interested in finding spaces Vm minimizing the maximum best approximation error for 
Ufa over all fi € V. A lower bound for what can be achieved is given by the Kolmogorov 
A-width of the solution manifold At, defined by 

dAr(At) := inf sup inf ||tt —r;||. (8) 

VnQV ugM I'SViv 
dim Vn'^N 

For the problem class we consider, it is known that the A-widths fall at least sub- 
exponentially fast, i.e., there are constants M,a,a > 0 s.t. 

dN{M) < Me-^^". (9) 

Thus, good approximation spaces Vn do exist. However, it is usually impossible to find 
spaces for which this lower bound disf{Ai) is attained. 

Reduced basis methods often employ greedy algorithms in order to obtain neaiiy-best 
approximation spaces Vn. A standard approach is the error estimator controlled greedy 
search defined in Algorithm[^ This algorithm assumes the existence of methods RBSolve 
and ErrEst for solving the reduced problem and estimating the reduction error given the 
required reduced model data LTD which is available through the Reduce method. In 
each iteration of the algorithm, the maximum reduction error over a finite training set 
Strain T V oI parameters is estimated and a high-dimensional solution snapshot for 
the parameter fi* maximizing the error estimates is computed. This snapshot is then 
used to extend the reduced space Vat. 

It is important to note that in Algorithm only lines 2 and 3 of the main loop in¬ 
volve high-dimensional operations and that these operations are performed only once 
per extension step. This allows to afford large training sets Strain, densely sampling the 
parameter space V. 
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If the error estimator used in Algorithm is rigorous and effective Q, this algorithm 
is a weak greedy algorithm in the sense of [6j. As a consequence im , Q carries over to 
the resulting spaces Vjsf in the sense that there are M',a' > 0 only depending on M, a 
and sup^^-p'f^/a^, s.t. 

sup inf ||n^ — n|| < M'e““(10) 

IJ-eStrain 

In conclusion, if we assume that our training set Strain is chosen dense enough, we will 
observe a (sub-)exponentially fast decay of the maximum reduction error, leading to large 
computational savings for many application problems. 

2.3. Extensions. The presented basic RB methodology can be extended in many di¬ 
rections, making it suitable for a wide range of application problems. We briefly cover 
three extensions which are relevant for our numerical examples. 


2.3.1. Proper orthogonal decomposition. As an alternative to a greedy construc¬ 
tion of V/v (Algorithm Q, we can perform a ‘proper orthogonal decomposition’ (POD) 
of a given training set of solution snapshots C P to obtain a set of or¬ 

thogonal basis vectors which describe the ‘main directions’ in V by which the snapshot 
set is characterized. More precisely, let —?■ V be the linear mapping sending the 

fe-th canonical basis vector of to Then the A:-th POD mode of the training set is 
defined to be the k-th left-singular vector of the singular value decomposition of The 
spaces V]\f^pod spanned by these vectors are /^-optimal in the sense that 


K 

E 


inf 


_^ '^^^N,pod 


u 




— f 


K 


min > inf 

wcy ^v&Vm 

dimVN<N 



( 11 ) 


POD is therefore a good option if this notion of optimality is desired (and not the 
/“-optimality the greedy approach is designed for). However, POD quickly becomes 
prohibitively expensive when a large training set is needed to approximate the solution 
manifold, since the full solution snapshot has to be computed for each parameter from 
the training set. 


2.3.2. Instationary problems. The RB methodology can be extended to instationary 
problems of the form 


{dtUp{t), ip) + p) = F{p) \/p e V, 

Up(0) = 


( 12 ) 


in a straightforward way. The most common approach is to perform a Galerkin projection 
of (121 onto a reduced space Vm C 1/ to arrive at an ordinary differential equation system 
of dimension N (method of lines). 
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Note, however, that solution snapshots are whole trajectories now, so it is no longer 
obvious how to extend Vjsf after a parameter fj,* has been selected in Algorithm A well- 
established approach with proven quasi-optimality is the Pod-Greedy algorithm m, 
which performs a POD of the orthogonal projection error trajectory {t)—Pvj^ {u^* (f)), 
of which the first modes are then added to Vjsf. 


2.3.3. Empirical interpolation. If does not satisfy (|y or even is nonlinear in the 
first variable, the standard RB approach is no longer efficient: while (|^ is still posed 
on the low-dimensional space Vjv, we cannot solve (|^ online without resorting to high¬ 
dimensional computations which will deteriorate most savings in computation time. 

A very generic approach to overcome this issue is empirical operator interpolation: 
given xi,...,xm £ V, ci,...,cm £ U' such that the matrix := {cj{xi))f^j^^ is 
non-singular, we approximate by the interpolated form Xm{B^) uniquely defined by 
TM{Bf,){v, •) £ spanjci,..., cm} and 

TM{Bf,){v,Xi) = Bi,{v,Xi), i = (13) 


for all V £ V. One easily checks that 

Tm{B^){v,-) = [ci,...,cm] -Im • [B^{v,xi),...,B^{v,xm)]'^- (14) 


^ is the finite element discretization of a partial differential operator, the evaluations 
-’/liv, xm) can be computed quickly and independently of dim V by know- 


In many cases, for instance when the Xi are chosen from a finite element basis and 
B^ 

B^{v,xi), ... ,5 

ing only M' < C ■ M degrees of freedom of v for a fixed constant C (e.g. depending on 
the stencil of the discretization). 

Thus, if we approximate B^ by Xm{B^) in ^ and substitute (14), we obtain 


[ci(v9),... ,cm((/^)] -Lm • [B|,iuN,^„Xl),.. .,B^{UN,^,,XM)]'^ = F{ip), (15) 


for all if £ Vjsf, which can be solved dimP-independently by pre-computing Cj{hn) and 
only storing the coefficients of bn which are required for the evaluation of B^{v,Xm)- 
The collateral interpolation basis ci,... ,cm, along with the interpolation points xi, 
..., xM) is obtained during the offline phase from a greedy search (Ei-Greedy, [T7]) or 
POD (DEIM, [9]) on an appropriate training set of evaluations of B^. 


3. Design of reduced basis software. In this section we discuss the software design 
issues which arise when implementing RB schemes and cover typical design approaches. 
We then present the approach taken by pyMOR and compare it to these standard designs. 

3.1. Required high-dimensional operations. The RB method is by nature a very 
generic model reduction framework which can be applied to a wide range of problems. 
It is an important feature of RB methods that existing high-dimensional discretizations 
can be used as they are in order to derive a reduced order model: any discretization. 
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no matter if, e.g., continuous finite elements, discontinuous Galerkin or meshless, can be 
used as long as the high-dimensional problem is of an appropriate form, such as Q or 

(T|). 

However, while this is true for the mathematical formulation, the code implementing 
the high-dimensional discretization nearly always has to be adapted for model order 
reduction: 


While the main purpose of the solver already is the computation of solution snapshots 
Ufj,, it is often not known a priori (cf. Algorithm]^ for which parameters /r the solution 
is required. Thus, either the high-dimensional solver has to stay initialized in memory 
during the whole offline phase, or the solver has to be re-initialized (mesh generation, 
matrix assembly, etc.) for each new solution snapshot. 


To ensure the numerical stability of the reduced model, the new solution has to be 
orthonormalized w.r.t. to the existing reduced basis, e.g. using a stabilized Gram-Schmidt 
algorithm, before it can be added to the reduced basis. For instationary problems (12), 
the solution trajectory has to be orthogonally projected onto the current reduced space 
Vjv and a POD of the projection errors has to be computed. 

The assembly of reduced system matrices for ^ requires the evaluation of and F for 
each (combination of) basis vector(s) of Vat. Moreover, the affine decomposition (|^ needs 
to be taken into account for offfine/online decomposition. The assembly of the reduction 
error estimator requires the computation of Riesz representatives for all residual parts 
(|^ and all inner products between them. An alternative approach with higher numerical 
stability [7] requires the computation of an orthonormal basis for the span of the Riesz 
representatives and the computation of the coefficients of the representatives with respect 
to the basis. For the reduction of non-affinely decomposed or nonlinear problems using 
empirical operator interpolation (Section 2.3.3), has to be evaluated on appropriate 
the collateral basis Cj and interpolation points Xi (14) have to be determined using 


u 


At) 


the Ei-Greedy or DEIM algorithm and the corresponding reduced data has to be 
computed. 


Finally, if access to the reduced solutions as elements of V is desired (e.g. for visual¬ 
ization), the solver needs to be invoked to perform the reconstruction (|^. 


3.2. Required low-dimensional operations. In order to solve the reduced problem 
(|^, the same type of algorithms (linear solvers, time-stepping, Newton algorithm) are 
required as for the solution of (Q. However, the involved data structures will be dif¬ 
ferent: dense instead of sparse matrices, no or shared memory parallelization instead of 
distributed memory parallelization. 

For reduced problems involving empirical operator interpolation, the restricted evalu¬ 
ation of according to ( |15[ ), is required. 

Lastly, the solution of (|^ is already required in the offline phase during greedy basis 
generation which, in case of adaptive variants [16] of Algorithm]^ may require substantial 
implementation work in itself. 
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3.3. Classical design approaches. Reduced basis implementations can be mainly 
categorized by how they realize the interaction between high- and low-dimensional oper¬ 
ations. All implementations we are aware of fall into one of the following three categories: 

Approach 1: Separate software. The whole RB scheme is implemented as a single 
software package which is able to read and process high-dimensional data produced by 
some external or self-written high-dimensional solver. All high-dimensional operations 
(Section |3.1[ ) within the offline phase are carried out by the RB software, with the possible 
exemption of the solution snapshot computation which may be performed by the solver. 
A typical examples is the rbMIT |28) package for Matlab. 

The main benefit of this approach is its simplicity: a single self-contained software 
package can be developed and maintained by experts for model order reduction in a 
programming language ecosystem of their choice. Interfacing external high-dimensional 
solvers is simple, only export of system matrices and (solution) vectors has to be imple¬ 
mented on the solver side. 

Consequently, the RB software has to be able to work with the high-dimensional data 
produced by the solver. E.g., the given sparse matrix format has to be supported and an 
adequate linear solver for the computation of Riesz representatives needs to be available. 
While this may be easily possible for small to medium sized discretizations using the 
tools provided by numerics packages such as Matlab, the limitations of this approach 
become obvious when we think of large-scale memory distributed problems solved on 
computer clusters where, for instance, a single system matrix for the whole problem 
is never assembled. By design, this approach also cannot be used in conjunction with 
matrix-free solvers. 

Handling of empirical interpolation is problematic, as well: either, for each model the 
exact same (with correct ordering of degrees of freedom) has to be re-implemented, 
or has to be evaluated by the external solver. This, however, does not seem to fit the 
paradigm of this approach very well. 

Approach 2: Inside high-dimensional solver. The complete reduction process is carried 
out by the high-dimensional solver which has been extended by an RB module. Examples 
are the RB implementations of the libMesh |21) and f eel++ m PDE solver packages. 

This approach offers the tightest possible integration between the high-dimensional 
model and the RB code, allowing maximum performance and easy development of ad¬ 
vanced reduction techniques which might require special operations on the high-dimen¬ 
sional data. Also, there cannot be any interoperability issues between different versions 
of the model reduction and the solver code. 

As a downside, implemented algorithms can only be used within the chosen software 
ecosystem. Given the large number of available PDE solver libraries, this vastly di¬ 
minishes the reusability of the code and ultimately hinders collaboration. Moreover, 
the implementor is required to have a good understanding of the inner workings of the 
PDE solver library, which is typically written in a system language such as C++. Many 
researchers working on model order reduction do not have such technical knowledge, 
however. Consequently many new methods are often only evaluated for ad hoc imple¬ 
mentations of academic ‘toy problems’. 
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Approach 3: Separate low- and high-dimensional operations. As a compromise between 
the aforementioned approaches, all low-dimensional operations are implemented in a ded¬ 
icated reduced basis software which communicates over well-defined interfaces with the 
PDE solver carrying out the high-dimensional operations (Section 3.1). This approach 
has been pursued by the integration of RBmatlab with the dune-rb module of the DUNE 
numerics environment |12) . 

This approach shares many benefits of the previous two approaches. The main RB logic 
can be developed independently from the solver in a programming language of choice and 
can be reused with any external solver implementing the necessary interfaces. All high¬ 
dimensional operations are performed by optimized solver code. Memory distributed or 
matrix-free implementations can be easily utilized. 

However, extending the PDE solver to perform all high-dimensional model reduction 
operations (Section 3.1) still requires substantial work. A change of the reduction algo¬ 
rithm will usually also require modification of the RB code in the solver. This can be 
a major issue when both components are developed by separate teams, in particular for 
research projects where these components are under constant change. 


3.4. Design of pyMOR. Our design is based on the central observation that all high¬ 
dimensional operations in RB schemes (Section |3.1[ ) can be expressed using only a small 
set of basic operations on operators and (collections of) vectors which are independent 
from the concrete reduction scheme in use. This lead us to the following basic design 
paradigm for pyMOR: 

1. Define model reduction agnostic interfaces for the mathematical objects involved 
with RB methods and related schemes. 

2. Generically implement all algorithms through operations provided by these inter¬ 
faces. 

As with design approach 1, pyMOR offers all model reduction code in a single self- 
contained software package. Since pyMOR provides its own basic discretization toolkit, it 
can be used completely on its own to easily test new reduction algorithms. The data 
types provided by the toolkit can also be used to import high-dimensional data from disk 
which has been produced by an external solver. This allows the use pyMOR’s algorithms 
in a workflow similar to design approach 1. Once a new model reduction algorithm has 
been developed, pyMOR’s interfaces allow to easily apply the exact same algorithm to high- 
fidelity models implemented in a high-performance solver running on a large computer 
cluster. 

Integrating a new external solver will usually require additional work on the solver 
side. However, since pyMOR’s interfaces work on a lower, model reduction agnostic level 
compared to design approach 3, these modification can be made mostly independently 
from the development of new model reduction algorithms. Also, our approach offers 
higher code reusability and maintainability since the complete reduction algorithm can be 
implemented in a single software library. This is particularly important when the model 
reduction library shall be used in conjunction with different PDE solver ecosystems. 
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pyMOR’s interfaces correspond to data structures which are already present in most 
PDE solver designs: operators and vectors. Thus, implementing pyMOR’s interfaces is 
basically equivalent to exposing the solver’s internal data structures via a public API. 
Refactoring a PDE solver to offer such an API generally empowers the user to easily use 
and extend the solver in new ways which have not been envisioned by the developers. 
E.g., pyMOR implements time-stepping algorithms which can be used to easily manufacture 
instationary discretizations out of stationary discretizations (see Section 5.2). While 
time-stepping schemes are clearly not a focus for pyMOR’s development, a software library 
of advanced time-stepping algorithms could use such an API to allow easy testing of 
these algorithms with a given discretization, after which a selected algorithm might 
be implemented in the solver for maximum performance. Moreover, a public API also 
allows interactive control of the solver, especially when used in conjunction with dynamic 
languages such as Python. Such interactive sessions can be a powerful tool for debugging, 
allowing to inspect and modify the solvers state in ways not possible with a classical 
debugger. 

Note that design approach 3 and pyMOR’s design strongly differ in the view on the 
relationship between the model reduction software and the high-dimensional solver: while 
approach 3 advocates a strong separation between low- and high-dimensional operations, 
we think of both components as strongly intertwined. E.g., with pyMOR it is easily possible 
to perform preliminary analyses of reduced models for which offline/online decomposition 
is not yet available. 

While design approaches 2 and 3 allow to perform all high-dimensional reduction op¬ 
erations with the maximum efficiency the PDE solver has to offer, it is clear that pyMOR’s 
interface design comes at a price of sub-optimal performance: every call of an inter¬ 
face method incurs a certain overhead, compiler optimizations may be hindered and the 
restriction to the available interface methods may prevent implementations which opti¬ 
mally exploit the given data structures. To asses the possible loss in performance, we 
have conducted several performance benchmarks (Section |5.1[ ) which show that, while a 
certain overhead exists, it becomes negligible for large problems. 


4. Model Reduction with pyMOR. In this section we present pyMOR’s interfaces in 
more detail (Section |4.1[ ) and discuss how the integration of external solvers via pyMOR’s 
interfaces can be technically realized (Section |4.2| ). A detailed example for the reduction 
of models implemented with the FEniCS, deal. II and DUNE solver libraries is presented 
in Section 4.3 Finally, we cover the parallelization of pyMOR’s reduction algorithms in 


Section 14.4 


4.1. pyMOR’s Interface Classes. From a bird’s eye perspective, pyMOR can be seen as 
a collection of generic algorithms operating on VectorArray, Operator and Discreti¬ 
zation objects which implement the interface methods that are defined by the abstract 
VectorArray Interface, Operator Interface and Discretization base classesj^ To in- 

Mhe full documentation of all interface methods is shipped with pyMOR and is available online at 
http://docs.pymor.org/ 
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Figure 1: Schematic view of a typical pyMOR application displaying the interaction be¬ 
tween the user code, external PDF solvers and several parts of pyMOR (see 
Section |4.3|) . 


tegrate pyMOR with an external high-dimensional solver, wrapper classes for these types 
representing the corresponding high-dimensional data inside the solver have to be imple¬ 
mented (cf. Figure [^. 

VectorArrays are ordered collections of vectors of same dimension, on which standard 
linear algebra operations such as linear combination (lincomb) or inner products (dot, 
gramian) can be performed. Selected degrees of freedom can be extracted (components), 
as required for empirical interpolation. All Vector Array methods can either act on single 
vectors or the whole array. Choosing arrays of vectors as elementary objects in pyMOR (as 
opposed to single vectors) not only allows to express many model reduction operations in 
a concise manner, it also allows implementations which optimally exploit the vectorized 
structure of the instructions for performance. An example of such an implementation is 
pyMOR’s NumpyVectorArray (cf. Section 5.1). Lists of single vector objects inside external 
solvers can be easily managed using the ListVectorArray class. 

Operators, which in pyMOR represent parametric matrices, bilinear forms, inner prod¬ 
ucts, functionals or nonlinear operators, can be applied to VectorArrays, yielding a new 
VectorArray of results (apply). Access to (linear) solvers is provided via apply_inverse, 
the Jacobian of an Operator is obtained as a new Operator via the Jacobian method. 
Affinely decomposed ([^ operators are represented as LincombOperators which hold lists 
of the Operators Bq and the ParameterFunctionals 6q. LincombOperators can con¬ 
tain other LincombOperators as summands, allowing an easy construction of arbitrarily 
nested affine decompositions, which are automatically handled by pyMOR’s reduction al¬ 
gorithms. 

Finally, Discretizations act as structured containers for Operators. For instance. 
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from pymor.basic import * 
from functools import partial 

d, prod = pymor_model0 

coerc_est = ExpressionParameterFunctional( ’min(diffusion)’ , d.parameter_type) 
reductor = partial(reduce_stationary_coercive, 

error_product=prod, coercivity_estimator=coerc_est) 
rd, rc = reduce_naive(d, reductor, 10) 


res = reduction_error_analysis(rd, d, rc, 

error_norms= [induced_norm(prod)], 


print (res [ ’summary’ ]) 


random_seed=77) 


Listing 1: Main script of example.py (simplified). 


StationaryDiscretization has an operator and rhs attribute. Moreover, Discreti¬ 
zations can be solved for a given parameter, returning a solution VectorArray (solve) 
which might be visualized using the visualize method. In StationaryDiscretization, 
solve for a given parameter mu is implemented generically by calling self. operator. 
apply_inverse(self.rhs.as_vector(mu), mu=mu). Other Discretizations might call 
optimized solution algorithms of an external solver. 

It is an important property of pyMOR’s interfaces that each method either returns 
low-dimensional data or new VectorArray, Operator or Discretization objects. This 
ensures that no high-dimensional data ever has to be communicated between the external 
solver and pyMOR and that no code for handling the solver-specific high-dimensional data 
structures has to be added to pyMOR. 

Note that not only the high-dimensional model but also the reduced low-dimensio¬ 
nal model is represented by VectorArrays, Operators and Discretizations, imple¬ 
mented inside pyMOR. This allows to use all algorithms in pyMOR with both high- and 
low-dimensional objects. For instance, the reduced model could be interpreted again as 
the high-dimensional model for an additional reduction step. 

4.2. Implementation. pyMOR is implemented with Python, a managed, dynamically 
typed programming language, which is easy to pick up (even for inexperienced program¬ 
mers), allowing quick and easy prototyping algorithms. 

In contrast to MATLAB, Python does not have copy-on-write semantics for assignment 
which, while allowing more precise control over data, often raises the issue of object 
ownership. To alleviate the novice user from having to care too much about ownership, 
pyMOR enforces immutability on all Discretizations and Operators. In combination 
with Python’s dynamic memory management, this makes the question of ownership ir¬ 
relevant for these objects. For VectorArrays, we have implemented shallow-copy/deep- 
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def pymor_model 0: 

problem = ThermalBlockProblem(num_blocks=(2, 2)) 

d, _ = discretize_elliptic_cg(problem, diameter=l./lOO.) 

return d, d.hl_0_semi_product 

def reduce_naive(d, reductor, rb_size): 

training_set = d.parameter_space.sample_randomly(rb_size) 

basis = d.operator.source.empty 0 
for mu in training_set: 

basis.append(d.solve(mu)) 

rd, rc, _ = reductor(d, basis) 
return rd, rc 


Listing 2: Methods used in 


Listing 


copy-on-write semantics. 

While Python is designed as a general purpose language, the NumPy m package of¬ 
fers a multi-dimensional array class with similar feature set and performance as MATLAB 
matrices. Additional numerical algorithms and sparse matrix types can be found in the 
SciPy |20) package. Both packages are used extensively in pyMOR for all low-dimensional 
data structures as well as for pyMOR’s builtin discretization toolbox. 

pyMOR’s interfaces do not make any assumption on how the communication between 
pyMOR and the external solver is implemented by the interfacing classes, and many com¬ 
munication patterns are conceivable, such as disk-based communication via job and out¬ 
put files or network-based communication via a standard protocol such as xml-rpc or 
custom protocols. Being a long-running general purpose scripting language. Python is 
ideally suited for pyMOR’s interface-based approach, offering a large selection of extension 
packages for handling virtually any established input-output protocol. 

However, in spirit of the tight coupling between pyMOR and the external solver as pro¬ 
moted by our design, we favor, whenever possible, to integrate the solver by re-compiling 
it as a Python extension module, giving Python direct access to the solver’s data struc¬ 
tures. This design not only delivers maximum performance as no communication over¬ 
head is present, it also allows to directly manipulate the solvers state from within Python 
beyond the operations available via pyMOR’s interfaces. This allows easy exploration of 
new ideas and offers the user an interactive Python debugging shell with direct access to 
the solver’s memory. All external solvers in the following examples have been integrated 
with this approach, showing its feasibility, even for MPI-distributed solvers running on 
high-performance computing clusters. This approach has also been taken for the inte¬ 
gration of pyMOR with the BEST battery simulation code as part of the MULTIBAT project 

m- 
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def reduce_pod(d, reductor, product, snapshots, rb_size): 

training_set = d.paraineter_space. sample_uniforinly(snapshots) 

snapshots = d.operator.source.empty() 
for mu in training_set: 

snapshots.append(d.solve(mu)) 

basis, singular_values = pod(snapshots, modes=rb_size, product=product) 

rd, rc, _ = reductor(d, basis) 
return rd, rc 

def reduce_greedy (d, reductor, product, snapshots, rb_size): 

training_set = d.parameter_space.sample_uniformly(snapshots) 
ext_alg = partial(gram_schmidt_basis_extension, product=product) 
result = greedy(d, reductor, training_set, 

extension_algorithm=ext_alg, max_extensions=rb_size, 
pool=new_parallel_pool()) 

return result [’reduced_discretization’] , result [’reconstructor ’] 

def reduce_adaptive_greedy (d, reductor, product, validation_mus, rb_size): 
ext_alg = partial(gram_schmidt_basis_extension, product=product) 
result = adaptive_greedy(d, reductor, validation_mus=-validation_mus, 

extension_algorithm=ext_alg, max_extensions=rb_size, 
pool=new_parallel_pool()) 

return result [’reduced_discretization’] , result [’reconstructor ’] 


Listing 3: Further reduction methods in example.py. 


4.3. pyMOR by example. In the following, we consider a basic example of how four dif¬ 
ferent high-dimensional models of the form Q, implemented with pyMOR’s discretization 
toolkit, FEniCS )22j . deal. II [2] and the DUNE numerics environment mm, can be all 
reduced with pyMOR using identical reduction algorithms. 

Listing contains the typical workflow in a pyMOR application. A Discretization 
object holding the high-dimensional model is obtained first, here by by calling the 
pymor_model method. In addition, pymor_model returns an Operator representing the 
inner product on the space V. Next, a reductor for performing the actual RB projection 
is selected. Here, we choose the generic reduce_stationary_coercive method, which 
will also assemble the error estimator Q according to 1^. A ParameterFunctional which 
assigns to each ^ € V the coercivity estimate is provided. The reduced basis of size 10 
and the resulting reduced model are then created using the reduce_naive method, which 
returns the Discretization holding the reduced model (rd) along with a Reconstructor 
object which is able to perform the high-dimensional reconstruction ([^. Finally, the qual- 
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ity of the reduced model is evaluated using pyMOR’s reduction_error_analysis method, 
which computes the model reduction error and the error estimator effectivity for random 
parameters. 

Listingj^contains the pynior_model and reduce_naive methods used in Listing[^ The 
former uses pyMOR’s builtin discretization toolkit, hrst instantiating a problem description 
class (line 2) and then discretizing the problem using hrst order continuous hnite elements 
(line 3). We consider here a classic 2x2 ‘thermal block’ test problem of hnding solutions 
for the stationary diffusion eqnation 

-V • (a/,(x)Vu^(x)) = 1, xGO:=[0, 1]^ 

Ufiix) = 0, X G dD 

with diffusion coefficient given by the linear combination of indicator functions 

m n 

l^i,j ■ X[i/m,{i+l)/m]x[j/n,{j+l)/n]{x), (17) 

j=0 j=0 

m = n = 2, for parameters /J, G V := [0.1,We can think of this as computing 
the heat distribntion in a square material composed of 2 x 2 blocks of different thermal 
condnctivity while being nniformly heated and its temperatnre at the bonndary kept 
constant at a value of 0. 

The reduce_naive method simply computes the reduced basis by selecting a set of 
random parameters (line 6) for which the high-dimensional solution is computed and 
appended to the basis VectorArray (lines 7-9). 

Listings and already form a complete pyMOR application. However, to obtain 
good resnlts, more advanced basis generation techniques schonld be used instead of 
reduce_naive. Listing [^ contains three alternatives nsing POD, the basic greedy al¬ 
gorithm (Algorithmic and an extended adaptive version according to |16j . reduce_pod 
extends reduce_naive by computing a POD of the given solution snapshots (line 6). 
The reduce_greedy method mainly defers all work to pyMOR’s greedy implementation 
(lines 12-14) which has to be called with a training set of parameters (line 10) and 
a method for extending the existing rednced basis by the new solntion snapshot. For 
this stationary problem, we can simply choose gram_schmidt_basis_extension which 
orthonormalizes the new solution snapshot w.r.t. the old basis using a stabilized Gram- 
Schmidt process including re-orthonormalization for improved nnmerical accnracy. Sim¬ 
ilarly, reduce_adaptive_greedy defers all work to pyMOR’s adaptive_greedy method. 
Both algorithms are provided a new defanlt worker pool (new_parallel_pool) for auto¬ 
matic parallelization of the reduction error estimation (line 1 of Algorithm Q . 

All fonr rednction methods are included in the example. py script provided in the 
supplementary material, which contains a slightly extended version of Listing as main 
fnnction. In addition to pymor_model, three further high-dimensional models are avail¬ 
able (cf. Listing 1C : 

fenics_model rises the dolf in |23) Python modnle of the FEniCS project to discretize 
a 4 X 3 ‘thermal block’ problem (m = 4, n = 3) using higher order hnite 
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def fenics_model() : 

V, matrices, rhs, hl_mat 


# FEniCS code to setup discrete function space, 

# as well as system, inner product and rhs matrices 


from pymor.operators.fenics import FenicsMatrixOperator 
from pymor.vectorarrays.fenics import FenicsVector 

coeffs = [ProjectionParameterFunctional( ’diffusion’ , (4, 3), (3 - y - 1, x)) 
for X in range (4) for y in range (3)] 
ops = [FenicsMatrixOperator(m, V, V) for m in matrices] 
op = LincombOperator(ops, coeffs) 

rhs = VectorFunctional(ListVectorArray( [FenicsVector(rhs, V)])) 
hl_product = FenicsMatrixOperator(hl_mat, V, V, name=’hl_0_semi ’ ) 

param_space = CubicParameterSpace(op.parameter_type, 0.1, 1.) 
d = StationaryDiscretization(op, rhs, products={ ’hl_0_semi’ : hl_product}, 

parameter_space=param_space) 

return d, d.hl_0_semi_product 
def dealii_model() : 

from dealii_example import ElasticityExample 
example = ElasticityExample(refine_steps=9) 


from pydealii.pymor.operator import DealllMatrixOperator 
from pydealii.pymor.vectorarray import DealllVector 
coeffs = [ProjectionParameterFunctional( ’mu’ , tupleO), 

ProjectionParameterFunctionaK’lambda’, tupleO)] 
ops = [DealllMatrixOperator(example.mu_mat0), 

DealllMatrixOperator(example.lambda_mat())] 
op = LincombOperator(ops, coeffs) 

rhs = VectorFunctional(ListVectorArray([DealllVector(example.rhs())])) 

param_space = CubicParameterSpace(op.parameter_type, 1., 10.)) 
d = StationaryDiscretization(op, rhs, products={ "energy" : ops[0]}, 

parameter_space=param_space) 

return d, d.energy_product 
def dune_model() : 

from spelO import examples, wrapper 

example = examples [’aluconformgrid’][’pdelab’] [’ istl ’](’ [60 220 85]’, True) 
d = wrapper [example. discretizationO] 

param_ranges = {’blockade’: (le-4, 1), ’sink’: (0, 1)} 

param_space = CubicParameterSpace(op.parameter_type, ranges=param_ranges) 
d = d.with_(parameter_space=param_space) 
return d, d.energy_0_product.assemble(le-4) 


Listing 4: Further models in example.py (shortened). 
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Reduction errors / Reduced solution times 



Rfj, 11150,^1 



Figure 2: Error analysis of the RB approximation of fenics_model using 
reduce_greedy. Left: Maximum R^-model order reduction error (red, solid) 
on a test set of 1000 parameters and maximum estimated error (red, dashed) 
on the training set in dependence on the reduced basis {Strain,i ■= {0.1, 

O, Strain ,2 ■= {0.1,0.55,1}"^^^: +); average time for the computation of a 
reduced solution including error estimation in dependence on the reduced basis 
size (blue). Right: Plot of the absolute difference (dark: 0, light: 6.03 • 10“^) 
between the detailed and the reduced solution \u^ — ui 5 o,^| (for Strain, 2 ) for 
the worst approximated parameter of the test set. 


elements. The resulting matrix objects are then wrapped by pyMOR’s FenicsMatrix- 
Operator and FenicsVector classes which translate pyMOR interface calls into appropriate 
operations on the underlying FEniCS data structures (lines 8, 10, 11). The affine decom¬ 
position (|^ of the problem is encoded by dehning appropriate ParameterFunctionals 
and then constructing an operator representing the decomposition (lines 6, 7, 9). Finally, 
an appropriate parameter space is chosen, and everything is wrapped up in a generic 
StationaryDiscretizaion object (lines 12-14). Figure (left) shows model reduction 
errors and timings for f enics_inodel (second order hnite elements, 361.201 degrees of 
freedom) reduced by reduce_greedy using the training sets Strain,i ■= {0.1,1}^^^ and 
Strain,2 ■= {0.1,0.55,1}^^^. We observe that choosing a too small training set leads to 
overfitting: while the solution is very well approximated for parameters in the training 
set, the approximation quality is not maintained for parameters not contained in the 
training set. The error for the hnal basis size {Strain, 2 ) is shown in Figure]^ 

(right). 

In discretize_dealii, we consider the two-dimensional linear elasticity problem 

-VX{V ■U^,x)-{V ■ giV)u^,x-y ■ h{^u^,xf = f inF!=[0,l]2 (18) 

with Lame parameters //, A G [1,10] and homogeneous Dirichlet boundary conditions 
from the tutorial documentation for the deal. 110 j2] C++ solver library (cf. Figure]^ 

^https://dealii.org/8.1.0/doxygen/deal.11/step_8.html 
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Figure 3: Sample solutions and error analysis of the RB approximation of dealii_model 
using adaptive_greedy. Plot of displacement field for (/x, A) = (1,1) 
(left) and (/x, A) = (1,10) (middle) with color representing magnitude (blue 
= 0, red = 0.033). Right: Plot of maximum energy model reduction error (top, 
O) and error estimator effectivity (bottom, min: •, max: +) over a test set of 
10 random parameters against the size of the reduced basis ([0,10]). 


left, middle). The tutorial program was refactored into the ElasticityExample class and 
Python bindings for this class were added (line 17). Moreover, we implemented Python 
bindings for the deal. II SparseMatrix and Vector classes, as well as corresponding 
pyMOR wrapper classes. These are used to make the deal. II matrices and right-hand side 
vector provided by ElasticExample available as pyMOR Operators (lines 23, 24, 26). The 
ElasticityExample calculates a solution for the displacement held by discretizing 
(18) with hrst order continuous Galerkin hnite elements (65.536 degrees of freedom) 
and solves the resulting linear system with a conjugate gradient method. The Python 
bindings and pyMOR wrappers are available in the pymor-deal. II package. Figure]^ 
(right) shows model reduction errors and estimator effectivities for dealii_model reduced 
by adaptive_greedy for a reduced basis of size 10. 

dune_model uses the DUNE numerics environment to discretize a multi-scale single phase 
how problem with the highly heterogeneous and anisotropic SPEIO model2 permeability 
helc0 with inhow boundary conditions at xi = 0, a hxed pressure at xi = 5 and no how 
at the other boundaries: 


-V • (Kf^VUfj) = /^, in n := [0,2] x [0,5] x [0,1]. 


(19) 


The parameter p, allows to toggle the presence of a blockade at xi = 2.5 and a sink 
near the blockade (compare Figure]^ left and middle). Problem (19) is discretized with 
dune-gdt m using hrst order continuous hnite elements on a tetrahedral grid with 
6.7 • 10® elements (1.2 • 10® degrees of freedom). The integration with pyMOR is based on 
dune-pymor which automatically wraps the resulting discretization objects dehned in the 
dune-hdd module as pyMOR Discretizations (line 34) taking the affine decomposition of 
the problem into account. Only the parameter space is additionally chosen (lines 36-37). 


'http://www.spe.org/web/csp/datasets/set02.htm 
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Figure 4: Data function, sample solutions and adaptively refined set of training samples 
of dune_model using adaptive_greedy. Top center: logarithmic plot of the 
Frobenius norm of the anisotropic SPEIO model2 permeability tensor (dark: 
6.65 • 10“®, light: 2 • 10®). Left and bottom center: volume plot of the pressure 
and vector plot of the reconstructed Darcy velocity (colored and 

scaled by magnitude) for several parameters (blue: weak, red: strong): // = 
(10“^,0) bottom left, p, = (10“^, 1) top left and // = (1,0) bottom center. The 
first parameter component models the existence of a blockade in the middle 
(enabled: 10“^, disabled: 1) and the second parameter component acts as 
a switch for a sink near the blockade. Right: plot of the adaptively refined 
training set for a reduced basis of size 50 (colored contribution of the local 
error indicators, blue: low, red: high) and selected training parameters (white 
circles). Note the strong influence of the first parameter component (which 
influences the operator), as opposed to the second parameter component (which 
only influences the right hand side). 


Figure]^ shows the adaptively refined training set and selected training parameters for a 
reduced basis of size 50, which yields a maximum absolute model reduction energy error 
of 5.5 • 10“^*^ over a set of 100 randomly chosen test parameters. 

Again, let us note that all four high-dimensional models can be reduced with the 
exact same model reduction code, ranging from the trivial reduce_naive algorithm to 
the sophisticated adaptive snashpot selection in reduce_adaptive_greedy. In fact, the 
example. py script provided in the supplementary material allows to choose any of the 
16 possible combinations between PDF solver and reduction method. 


4.4. Parallelism in pyMOR. In many application areas of reduced order modeling, not 
only the efficiency of the reduced model but also the required time for generating the 
model needs to be take into account. Thus, RB software should be able to perform offline 
computations with good computational efficiency. For modern computing architectures 
this requires parallelization of algorithms. 

In a greedy basis generation algorithm (Algorithm Q , the main computational work 
is made up by three types of operations: 1. reduction error estimation on Strain, 2. com¬ 
putation of the solution snapshot u^*, 3. reduced basis extension and reduction of the 
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high-dimensional model. 

For the parallelization of the high-dimensional operations in steps 2 and 3, pyMOR re¬ 
lies on already existing, high-performance parallelization of the external solver. Since 
pyMOR’s interfaces require no communication of any high-dimensional data and are com¬ 
pletely implementation agnostic, memory distributed vector data can be handled via the 
VectorArray interface as efficiently as any non-distributed data. 

To ease the integration of MPI distributed codes, pyMOR offers tools which allow to au¬ 
tomatically support MPI parallel use of the solver when pyMOR bindings for the sequential 
case already exist. For instance, to parallelize f enics_inodel in Listing one would 
simply execute: 

d = mpi_wrap_discretization(lainbda: fenics_model() [0] , 

use_with=True, pickle_subtypes=False) 

prod = d. hl_0_senii_product 


When the script is executed with mpirun, an event loop is launched on all MPI ranks 
except for rank 0 which executes the main script. mpi_wrap_discretization then 
instructs each rank to execute the function given as hrst argument to obtain a local 
Discretization object. The Discretization returned by mpi_wrap_-discretization 
and all contained Operators will then use the event loop to issue MPI distributed op¬ 
erations on the rank-local objects when corresponding interface methods are called. In 


Section 5.2 we consider another example where pyMOR’s MPI wrappers are used. 

For the parallelization of step 1 and similar embarrassingly parallel tasks which require 
little to no communication, pyMOR provides an abstraction layer for existing Python par¬ 
allelization solutions based on a simple worker pool concept (pymor. parallel). For 
instance, line 1 in Algorithm could be parallelized by executing 

numpy.argmax(pool.map(lambda mu,rd: rd.estimate(rd.solve(mu), mu), 

training_set, rd=reduced_discretization)) 


The function and all arguments are automatically serialized and distributed to the work¬ 
ers of the pool, ensuring that immutable data is only communicated once. 

pyMOR currently provides a worker pool implementation based on the IPython |29) 
toolkit, which allows easy parallel computation with large collections of heterogeneous 
compute nodes, and an MPI-based implementation using pyMOR’s event loop which can 
seamlessly be used in conjunction with external solvers using the same event loop. The 
MPI-based worker pool was also used for the experiment in Figure]^ employing 16 cores 
of the compute server described in Section 5.1 For the larger training set Strain ,2 with 


531.441 parameters, the total offline computation time was 11 hours, of which 5 hours 
were spent on error estimation. Performing the error estimation without parallelization 
would have required an additional 37 hours of computation time. 


5. Performance evaluation. In this section we evaluate the applicability and per¬ 
formance of pyMOR’s design approach by considering technical benchmarks of some of 
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pyMOR’s interfaces, as well as a challenging nonlinear large-scale model order reduction 
problem. 


5.1. Benchmarks. The main goal of this section is to compare the performance of 
pyMOR’s VectorArray and Operator interfaces when used to access external high-dimen¬ 
sional solver data structures to native implementations of these classes. Moreover, we 
want to investigate possible performance benefits for vectorized VectorArray implemen¬ 
tations. 

As native implementations within pyMOR we consider NumpyVectorArray, which al¬ 
lows vectorized operations on vectors by internally holding an appropriately sized two- 
dimensional NumPy array, as well as ListVectorArray maintaining a Python list data 
structure holding vector objects implemented as one-dimensional NumPy arrays. The in¬ 
ner product in the pod benchmark is implemented with NumpyMatrixOperators holding 
sparse SciPy matrices coming from pyMOR’s own discretization toolbox. 

The external solver code is based on the DUNE numerics environment mi, centered 
around the discretization toolbox dune-gdt |31) (compare Section 4.3) and is compiled 
as a Python extension module as described in Section |4.2[ 

Our benchmarks were executed on a dual socket compute server equipped with two 
Intel Xeon E5-2698 v3 CPUs with 16 cores running at 2.30GHz each and 256GB of 
memory available. All benchmarks were performed as single-threaded processes. 

Vector array benchmark. We consider the axpy method of the VectorArray Interface, 
which performs a vectorized BLAS-conforming axpy operation, i.e. pairwise in-place ad¬ 
dition of the vectors in the array with vectors of a second array multiplied by a scalar 
factor. 

As we observe in Figure]^ (left), both ListVectorArray-based implementations show 
about the same performance for sufficiently large array dimensions. In fact, the DUNE- 
based implementation (dune-gdt, list based) actually performs better than the NumPy- 
based implementation (pyMOR, list based), showing the tight integration between pyMOR 
and the external solver. The NumpyVectorArray implementation, on the other hand, 
cannot benefit from vectorization for larger array dimensions 

POD benchmark. As detailed previously (e.g.. Section [3.1[ ), the Gram-Schmidt and 
POD algorithms are important tools in the context of model reduction. The implemen¬ 
tation of a numerically stable Gram-Schmidt or POD algorithm might not be completely 
straightforward and it is an important benefit that pyMOR’s interface design allows to 
provide tried and tested implementations of these algorithms which can automatically 
be used with any external solver integrated with pyMOR. 

pyMOR’s POD algorithm mainly consist of three steps with different complexities: 
1. computation of a Gramian matrix with respect to the given inner product (product. 
apply2), 2. computation of the eigenvalue decomposition of the Gramian (using the SciPy 
eigh method) and 3. mapping right-singular vectors to left-singular vectors (by calling 


^We assume that this is due to the fact that NumPy offers no native axpy operations, such that a 
large temporary array has to be created holding all to be added and scaled vectors. It is planned 
to improve performance of NumpyVectorArray.axpy by directly calling out to BLAS (cf. https: 
//github.com/pymor/pymor/issues/73). 
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Benchmark of VectorArraylnterf ace . axpy 


Benchmark of algorithms. pod 




Figure 5: Log/log plot of the measured execution time of A. axpy (left) and the POD 
algorithm (right) for different implementations and several lengths of the vec¬ 
tor array (len(A)==l: O, len(A)==4: A, len(A)==16: -t-, len(A)==64: •, 
len(A)==256: A). Left: Comparison of A. axpy (X) with len(X)==len(A) 
for several implementations of A. Right: Comparison of pod(A=A, modes=10, 
product==hl_0, orthonorinalize=False, check=False) for several imple¬ 
mentations of A and the hl_0 product (solid) and the respective time spent 
in scipy.eigh (dotted). 


lincomb on the original VectorArray). Note that the computational cost for steps 1 and 
3 depends on the VectorArray and Operator implementation, scaling linearly with the 
array dimension and quadratically (resp. linearly) with the array length. The computa¬ 
tional cost for step 2 is independent of the space dimension, and only increases with the 
number of given vectors. 

As the inner product for both benchmarks and all implementations we have chosen the 
full iV^-product matrix stemming from a first order continuous finite element discretiza¬ 
tion over the same structured triangular grid on the unit square. 

As we observe again in Figure]^ (right) both ListVectorArray-based implementation 
show roughly equal performance. However, the vectorized pyMOR implementation is able 
to clearly outperform both other implementations thanks to the fact that the computa¬ 
tionally dominant steps 1 and 3 of the algorithm can be expressed idiomatically via a 
single interface call. This shows that VectorArray implementations can indeed greatly 
benefit from pyMOR’s vectorized interface design. NumpyVectorArray, for instance, does so 
by calling NumPy’s dot method which is able to defer the task to highly optimized BLAS 
implementations. We expect similar performance benefits for external high-dimensional 
solvers, when consecutive-in-memory arrays of vectors are available as native data struc¬ 
tures inside theses solvers. 


5.2. A large, nonlinear problem. To evaluate pyMOR’s ability to handle large-scale 
problems, we consider a three-dimensional version of the Burgers-type problem already 
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Table 1: Time in seconds needed for the solntion of (20) for a single parameter, nsing 


standalone DUNE code in comparison to DUNE code with time-stepping in pyMOR 
(best of 3 rnns). 


MPI ranks 

1 

2 

3 

6 

12 

24 

48 

96 

192 

DUNE 

16858 

8532 

5726 

2959 

1526 

773 

396 

203 

107 

pyMOR 

17683 

8940 

6050 

3124 

1604 

815 

417 

213 

no 

overhead 

4.9% 

4.8% 

5.7% 

5.6% 

5.1% 

5.4% 

5.3% 

4.9% 

2.8% 


discnssed in m, i.e., we solve the scalar conservation law 


x) -h Va, • (v • u{t, xY) = 0, X G [0, 2] x [0, if ,t G [0,0.3] 
rt^(0,x) = -(1-|-sm(27rxi) sm(27rx2) sin(27rx3)) 


( 20 ) 


for exponents G [1,2], periodic bonndary conditions and constant transport direction 
V = (1,1,1) (cf. Fignre]^. The problem was discretized nsing a hnite volnme scheme on 
a 480 X 240 x 240 voxel grid with 27.6 million degrees of freedom. 

To keep the implementation, which is available in the snpplementary material, as 
simple as possible, we chose a basic Lax-Friedrichs nnmerical flnx with explicit Enler 
time discretization nsing 600 eqnidistant time steps. Onr MPI-parallel code only depends 
on the DUNE grid interface and inclndes hand-written pyMOR bindings ntilizing the MPI 


helper classes and event loop covered in Section 4.4 


To evalnate the performance of the integration with pyMOR, we compared the solntion 


time for (20) nsing a standalone version of the solver to the time needed with time¬ 
stepping done by pyMOR’s explicit_euler time-stepper (Table Q. We observe that 
the pyMOR integration shows very good performance with only small overhead compared 
to the native DUNE version. All compntations were performed on 1 to 16 nodes of the 
University of Munster’s PALMA computing cluster. Every node contains 48GB of main 
memory and two hexa-core Intel Xeon E5650 CPUs. 

For the model order reduction we used the EI-Greedy m algorithm to generate the 
interpolation data for the nonlinear space differential operator and a simple POD for the 
computation of the reduced basis. In both cases, solution trajectories for 10 equidistant 
parameters were chosen as input which had each been compressed beforehand using 
additional PODs with a relative tolerance of 10“^. Thanks to the parallelization of the 
high-dimensional discretization, the offline phase of the experiment could be completed 
in only 3.4 hours. 

Figure summarizes the approximation quality of the reduced model for different 
reduced basis sizes and numbers of interpolation points. In particular we observe that 
for a reduced basis of size 80 and 300 interpolation points, we can achieve a relative 
L°° — LP' model reduction error of 2.6 • 10“^. A solution for this reduced model takes 
on average 2.8 seconds using a single processor core, resulting in a speedup of 38 in 
comparison to a solution of the high-dimensional model using all 192 cores or a speedup 
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Figure 6: Left: Plot of the solution of (20) for /r = 2 at final time t = 0.3. Only values 
in the intervals [0,0.4] (blue) and [0.6,1] (red) are displayed. Right: Model 
reduction errors for RB approximation of (20). Maximum L°° — L^-error on 
a test set of 10 random parameters for different reduced basis sizes N and 
numbers of interpolation points M. 


of 6000 in comparison to a single-core computation. 


Code availability. pyMOR and all further code used for the production of the results 
in this work are available under open source licenses. The specific versions used here 
are included in the supplementary material. Current versions of pyMOR, including the 
wrapper classes for FEniCS, deal. II and DUNE, as well as the code for the example in 
Section 5.2 can be found at http://www.pymor.org/ 
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